DAEDALUS at WebPS-3 2010: k-Medoids Clustering Using a Cost Function Minimization

Authors

  • Sara Lana-Serrano
  • Julio Villena-Román
  • José Carlos González
Abstract

This paper describes the participation of the DAEDALUS team in WebPS-3 Task 1, on Web People Search. The focus of our research is to evaluate and compare the computational requirements and the results achieved by different solutions based on the minimization of cost functions applied to clustering algorithms. Our clustering technique is based on an implementation of the k-Medoids algorithm, run over a sparse term-document matrix built from the terms of the pages associated with each person name. We define an empty cluster that holds all the individuals that are not part of any other cluster. Based on the results obtained, we conclude that, although clustering techniques play a very relevant role in resolving name homonymy in a set of web pages, a prior challenge remains unsolved: how to determine which contents of a webpage are relevant for describing the person, and which are merely part of the other navigational information contained in that webpage.
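The abstract does not specify the distance measure, the cost function, or how pages are sent to the empty cluster, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes cosine distance over sparse term-weight dictionaries, a cost equal to the sum of distances from each page to its medoid, a hypothetical threshold `max_dist` beyond which a page goes to the empty cluster at a fixed penalty, and a simple PAM-style swap search. All function and parameter names are illustrative.

```python
import math

EMPTY = -1  # label of the hypothetical "empty cluster"


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign(docs, medoids, max_dist):
    """Assign each page to its nearest medoid and return (labels, cost).

    Assumption: a page farther than max_dist from every medoid goes to the
    empty cluster and contributes a fixed penalty of max_dist to the cost.
    """
    labels, cost = [], 0.0
    for doc in docs:
        dists = [cosine_distance(doc, docs[m]) for m in medoids]
        best = min(range(len(medoids)), key=dists.__getitem__)
        if dists[best] <= max_dist:
            labels.append(best)
            cost += dists[best]
        else:
            labels.append(EMPTY)
            cost += max_dist
    return labels, cost


def k_medoids(docs, k, max_dist, max_iter=20):
    """Swap-based search: accept any medoid swap that lowers the cost."""
    medoids = list(range(k))  # deterministic start: the first k pages
    labels, cost = assign(docs, medoids, max_dist)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for cand in range(len(docs)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                t_labels, t_cost = assign(docs, trial, max_dist)
                if t_cost < cost - 1e-12:
                    medoids, labels, cost = trial, t_labels, t_cost
                    improved = True
        if not improved:
            break
    return medoids, labels, cost


# Toy pages for one ambiguous name: two tennis pages, two law pages, one stray.
docs = [
    {"tennis": 1.0, "player": 1.0},
    {"tennis": 1.0, "match": 1.0},
    {"law": 1.0, "court": 1.0},
    {"law": 1.0, "judge": 1.0},
    {"recipe": 1.0},
]
medoids, labels, cost = k_medoids(docs, k=2, max_dist=0.6)
```

In this toy run the two tennis pages share one cluster, the two law pages share the other, and the stray page falls into the empty cluster. The penalty value controls how readily pages are left unassigned, which is a design choice of this sketch rather than something stated in the abstract.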

Similar resources

A Novel Spatial Clustering with Obstacles Constraints Based on PNPSO and K-Medoids

In this paper, we propose a novel Spatial Clustering with Obstacles Constraints (SCOC) based on Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear Particle Swarm Optimization (PNPSO) and K-Medoids, which is called PNPKSCOC. The contrastive experiments show that PNPKSCOC is effective and has better practicalities, and it performs better than PSO K-Medoids SCOC in terms of quantization er...

Enhancing K-Means using class labels

Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class-uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In particular, we p...

Performance Evaluation of Partition Based Clustering Algorithms in Grid Environment Using Design of Experiments

Clustering is one of the most important research areas in the field of data mining. Clustering means creating groups of objects based on their features in such a way that objects belonging to the same group are similar and those belonging to different groups are dissimilar. K-Means and K-Medoids are basic partition-based clustering algorithms. One of the disadvantages of using these algo...

Intrusion Detection based on a Novel Hybrid Learning Approach

Information security and Intrusion Detection Systems (IDS) play a critical role in the Internet. An IDS is an essential tool for detecting different kinds of attacks in a network and for maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...

A K-means-like Algorithm for K-medoids Clustering

Clustering analysis is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. This paper proposes a new algorithm for K-medoids clustering which runs like the K-means algorithm and tests several methods for selecting initial medoids. The proposed algorithm calculates the distance matrix once and uses it for finding new medoids at every i...
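The idea summarized above, computing the distance matrix once and reusing it to find new medoids at each iteration, might be sketched as follows. This is an illustrative reading of the truncated abstract, not the published algorithm: Euclidean distance, the caller-supplied initial medoids, and all function names are assumptions.

```python
import math


def distance_matrix(points):
    """Compute all pairwise Euclidean distances once, up front."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def assign_to_nearest(dist, medoids):
    """Label each object with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def kmeans_like_k_medoids(dist, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update, as K-means does with means."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign_to_nearest(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # guard: keep the old medoid for an empty cluster
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance
            # to the other members, read straight from the matrix.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_to_nearest(dist, medoids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
dist = distance_matrix(points)
medoids, labels = kmeans_like_k_medoids(dist, initial_medoids=[0, 1])
```

Because every update only reads entries of the precomputed matrix, no distance is recomputed after the first pass, at the price of quadratic memory in the number of objects.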

Publication date: 2010